38 research outputs found

    Exploring Bottom-up and Top-down Cues with Attentive Learning for Webly Supervised Object Detection

    Fully supervised object detection has achieved great success in recent years. However, abundant bounding-box annotations are needed to train a detector for novel classes. To reduce the human labeling effort, we propose a novel webly supervised object detection (WebSOD) method for novel classes that requires only web images without further annotations. Our proposed method combines bottom-up and top-down cues for novel class detection. Within our approach, we introduce a bottom-up mechanism based on a well-trained fully supervised object detector (i.e., Faster R-CNN) as an object region estimator for web images by recognizing the common objectness shared by base and novel classes. With the estimated regions on the web images, we then utilize top-down attention cues as guidance for region classification. Furthermore, we propose a residual feature refinement (RFR) block to tackle the domain mismatch between the web domain and the target domain. We demonstrate our proposed method on the PASCAL VOC dataset with three different novel/base splits. Without any target-domain novel-class images or annotations, our proposed webly supervised object detection model achieves promising performance for novel classes. Moreover, we conduct transfer learning experiments on the large-scale ILSVRC 2013 detection dataset and achieve state-of-the-art performance.

    PGDiff: Guiding Diffusion Models for Versatile Face Restoration via Partial Guidance

    Exploiting pre-trained diffusion models for restoration has recently become a favored alternative to the traditional task-specific training approach. Previous works have achieved noteworthy success by limiting the solution space using explicit degradation models. However, these methods often fall short when faced with complex degradations, which generally cannot be precisely modeled. In this paper, we propose PGDiff by introducing partial guidance, a fresh perspective that is more adaptable to real-world degradations than existing works. Rather than specifically defining the degradation process, our approach models the desired properties, such as image structure and color statistics of high-quality images, and applies this guidance during the reverse diffusion process. These properties are readily available and make no assumptions about the degradation process. When combined with a diffusion prior, this partial guidance can deliver appealing results across a range of restoration tasks. Additionally, PGDiff can be extended to handle composite tasks by consolidating multiple high-quality image properties, achieved by integrating the guidance from the respective tasks. Experimental results demonstrate that our method not only outperforms existing diffusion-prior-based approaches but also competes favorably with task-specific models. Comment: GitHub: https://github.com/pq-yang/PGDif

    3D Unet-based Kidney and Kidney Tumor Segmentation with Attentive Feature Learning

    To study kidney diseases and kidney tumors from Computed Tomography (CT) imaging data, it is helpful to segment the region of interest with a computer-aided auto-segmentation tool. In the KiTS 2019 challenge [1], we are provided with 3D volumetric CT data to train a model for kidney and kidney tumor segmentation. We introduce an improved deep 3D Unet that enriches the feature representation of CT images using an attention module. We achieve a 1.5% improvement in segmentation accuracy when evaluated on the validation set.

    Sterically Induced Binding Selectivity of Single m-Terphenyl Isocyanide Ligands

    Sterically encumbering m-terphenyl isocyanides are a class of metal-binding group that foster low-coordinate metal-center environments in coordination chemistry by exerting considerable intermolecular steric pressures between neighboring ligands. In the context of metal surfaces, the encumbering steric properties of the m-terphenyl isocyanides are shown to weaken the interaction between the metal-binding group and a planar substrate, leading to a preference for molecular adsorption at sites with convex curvature, such as the step edges and herringbone elbow sites on Au(111). Here, we investigate the site-selective binding of individual m-terphenyl isocyanide ligands on a Au(111) surface through scanning tunneling microscopy (STM) and inelastic electron tunneling spectroscopy (IETS). The site-dependent steric pressure alters the vibrational fingerprint of the m-terphenyl isocyanides, which is characterized with single-molecule precision through joint experimental and theoretical approaches. This study provides, for the first time, molecular-level insights into the steric-pressure-enabled surface binding selectivity as well as its effect on the chemical properties of individual m-terphenyl isocyanide ligands, thereby highlighting the potential to control the physical and chemical properties of metal surfaces through tailored ligand design.

    Dual Semantic Fusion Network for Video Object Detection

    Video object detection is a tough task due to the deteriorated quality of video sequences captured under complex environments. Currently, this area is dominated by a series of feature enhancement based methods, which distill beneficial semantic information from multiple frames and generate enhanced features through fusing the distilled information. However, the distillation and fusion operations are usually performed at either frame level or instance level with external guidance using additional information, such as optical flow and feature memory. In this work, we propose a dual semantic fusion network (abbreviated as DSFNet) to fully exploit both frame-level and instance-level semantics in a unified fusion framework without external guidance. Moreover, we introduce a geometric similarity measure into the fusion process to alleviate the influence of information distortion caused by noise. As a result, the proposed DSFNet can generate more robust features through the multi-granularity fusion and avoid being affected by the instability of external guidance. To evaluate the proposed DSFNet, we conduct extensive experiments on the ImageNet VID dataset. Notably, the proposed dual semantic fusion network achieves, to the best of our knowledge, the best performance of 84.1% mAP among the current state-of-the-art video object detectors with ResNet-101 and 85.4% mAP with ResNeXt-101 without using any post-processing steps. Comment: 9 pages, 6 figures

    TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT

    Tables are prevalent in real-world databases, requiring significant time and effort for humans to analyze and manipulate. Advancements in large language models (LLMs) have made it possible to interact with tables using natural language input, bringing this capability closer to reality. In this paper, we present TableGPT, a unified fine-tuned framework that enables LLMs to understand and operate on tables using external functional commands. It introduces the capability to seamlessly interact with tables, enabling a wide range of functionalities such as question answering, data manipulation (e.g., insert, delete, query, and modify operations), data visualization, analysis report generation, and automated prediction. TableGPT aims to provide convenience and accessibility to users by empowering them to effortlessly leverage tabular data. At the core of TableGPT lies the novel concept of global tabular representations, which empowers LLMs to gain a comprehensive understanding of the entire table beyond meta-information. By jointly training LLMs on both table and text modalities, TableGPT achieves a deep understanding of tabular data and the ability to perform complex operations on tables through chain-of-command instructions. Importantly, TableGPT offers the advantage of being a self-contained system rather than relying on external API interfaces. Moreover, it supports an efficient data processing flow, query rejection (when appropriate), and private deployment, enabling faster domain data fine-tuning and ensuring data privacy, which enhances the framework's adaptability to specific use cases. Comment: Technical Report

    Learning to recognize objects by adaptive knowledge transfer

    When humans learn new knowledge and skills, we can naturally transfer them to other domains. As we learn, we acquire knowledge and skills for certain tasks and transfer them to similar tasks; we can also use old knowledge to facilitate the learning of new knowledge. While effective knowledge transfer is an innate and important learning ability of humans, it is not easy for machine learning mechanisms to acquire this ability. In recent years, many works have studied transfer learning in deep learning. Some practical challenges still remain unexplored, especially under the different problem settings encountered in real situations. In this thesis, we explore how to adopt knowledge transfer mechanisms in deep learning approaches in several practical scenarios. Four different works are proposed to study knowledge transfer across different domains and tasks via domain adaptation and model transfer. In particular, we study web knowledge transfer for the object detection task by adapting web data to the real target dataset, which aims at reducing the human annotation effort for training an object detector. In the incremental learning scenario, we study the cross-utilization of old and new knowledge to overcome catastrophic forgetting during the incremental and progressive learning process. Lastly, we explore transfer learning in the medical imaging domain by transferring a model pre-trained on normal images. Overall, the major contributions are summarized as follows:
    - A web knowledge transfer method to enhance the learning of weakly supervised object detection. The proposed method includes an effective web data collection pipeline and a curriculum learning scheme to achieve more effective model optimization during multi-instance learning.
    - An annotation-effective object detection method that adapts web data to the target data for object detection. This work attempts to learn an object detector from web supervision by adversarial domain adaptation.
    - An incremental learning scheme that adapts an old model to a new model without forgetting the old knowledge. A systematic study is performed to explore different class-incremental methods. Furthermore, we propose a graph-based method to mine the forgettability of old samples during the training of new tasks, and dynamically select the more forgettable samples to overcome catastrophic forgetting.
    - A lesion detection method for 3D CT images that utilizes model weights pre-trained on normal 2D RGB images. Furthermore, an attention-based feature aggregation method is proposed to adaptively transfer information from neighboring slices to the key slices for a more discriminative representation.
    Through this thesis, we demonstrate three different paradigms of knowledge transfer: (1) cross-domain knowledge transfer for adapting web data to an application with real unconstrained data, (2) continual knowledge transfer from old tasks to new tasks without forgetting the old knowledge, and (3) model transfer from the normal image domain to the medical domain. Experiments on several practical tasks demonstrate the effectiveness of the proposed knowledge transfer approaches. Doctor of Philosophy